-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache #31652
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135475 has finished for PR 31652 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
@@ -879,7 +879,7 @@ class Analyzer(override val catalogManager: CatalogManager) | |||
|
|||
private def unwrapRelationPlan(plan: LogicalPlan): LogicalPlan = { | |||
EliminateSubqueryAliases(plan) match { | |||
case v: View if v.isDataFrameTempView => v.child | |||
case v: View if v.isTempViewStoringAnalyzedPlan => v.child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed because this type of view can also be created not from a dataframe - e.g., ALTER VIEW ... AS
with spark.sql.legacy.storeAnalyzedPlanForView
set to true.
// If the plan cannot be analyzed, throw an exception and don't proceed. | ||
val qe = session.sessionState.executePlan(query) | ||
qe.assertAnalyzed() | ||
val analyzedPlan = qe.analyzed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query
will be an analyzed plan since AlterViewAs
is converted to AlterViewAsCommand
in Analyzer phase (ResolveSessionCatalog
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we remove the similar code in CreateViewCommand
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot yet because of the following:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala
Lines 96 to 105 in f340857
override lazy val planToCache: LogicalPlan = { | |
Dataset.ofRows(sparkSession, | |
CreateViewCommand( | |
name = TableIdentifier(tempViewName), | |
userSpecifiedColumns = Nil, | |
comment = None, | |
properties = Map.empty, | |
originalText = Some(originalText), | |
child = query, | |
allowExisting = false, |
Here, query
is not resolved yet. I will get to this though.
@@ -116,7 +116,7 @@ case class CreateViewCommand( | |||
if (viewType == LocalTempView) { | |||
val aliasedPlan = aliasPlan(sparkSession, analyzedPlan) | |||
if (replace && needsToUncache(catalog.getRawTempView(name.table), aliasedPlan)) { | |||
logInfo(s"Try to uncache ${name.quotedString} before replacing.") | |||
logDebug(s"Try to uncache ${name.quotedString} before replacing.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to logDebug
because the permanent view was using it:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
Line 174 in 9ec8696
logDebug(s"Try to uncache ${viewIdent.quotedString} before replacing.") |
If logInfo
is better, I can change these to logInfo
instead.
checkCyclicViewReference(analyzedPlan, Seq(name), name) | ||
|
||
logDebug(s"Try to uncache ${name.quotedString} before altering.") | ||
CommandUtils.uncacheTableOrView(session, name.quotedString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Uncaching seems like a right thing to do when you alter a view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I think so, since the cached data is stale. It's similar to CREATE OR REPLACE TEMP VIEW, seems we have an extra check needsToUncache
. Shall we do it here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the logic is very similar, I refactored this part to share code with CreateViewCommand
.
Test build #135849 has finished for PR 31652 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
Outdated
Show resolved
Hide resolved
checkCyclicViewReference(analyzedPlan, Seq(name), name) | ||
|
||
logDebug(s"Try to uncache ${name.quotedString} before altering.") | ||
CommandUtils.uncacheTableOrView(session, name.quotedString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the logic is very similar, I refactored this part to share code with CreateViewCommand
.
rawTempView: LogicalPlan, | ||
aliasedPlan: LogicalPlan): Boolean = rawTempView match { | ||
// If TemporaryViewRelation doesn't store the analyzed view, always uncache. | ||
case TemporaryViewRelation(_, None) => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary because case p => !p.sameResult(aliasedPlan)
should cover, but I added this case explicitly to make the intention clear.
needsToUncache(r, aliasedPlan) | ||
}.getOrElse(false) | ||
if (replace && uncache) { | ||
logInfo(s"Try to uncache ${name.quotedString} before replacing.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logDebug?
* If `userSpecifiedColumns` is defined, alias the analyzed plan to the user specified columns, | ||
* else return the analyzed plan directly. | ||
*/ | ||
def aliasPlan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we still keep this in CreateViewCommand
and let it pass the aliasedPlan
to createTemporaryViewRelation
? This logic is not needed for AlterViewAsCommand
.
// Do not need to uncache if the to-be-replaced temp view plan and the new plan are the | ||
// same-result plans. | ||
case TemporaryViewRelation(_, Some(p)) => !p.sameResult(aliasedPlan) | ||
case p => !p.sameResult(aliasedPlan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need it? rawTempView
should always be TemporaryViewRelation
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to first update CREATE TEMP VIEW USING
:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala
Lines 93 to 100 in 3a299aa
val viewDefinition = Dataset.ofRows( | |
sparkSession, LogicalRelation(dataSource.resolveRelation())).logicalPlan | |
if (global) { | |
catalog.createGlobalTempView(tableIdent.table, viewDefinition, replace) | |
} else { | |
catalog.createTempView(tableIdent.table, viewDefinition, replace) | |
} |
, which I will get to right after this PR.
Once the PR is done, we can update createTempView
to take in TemporaryViewRelation
(https://github.com/apache/spark/pull/31273/files#r580757641), then I can safely remove this line.
Let me actually create a JIRA to capture these as subtasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created https://issues.apache.org/jira/browse/SPARK-34698 to track these.
sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
Outdated
Show resolved
Hide resolved
Test build #135921 has finished for PR 31652 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135946 has finished for PR 31652 at commit
|
thanks, merging to master! |
…d take/return more concrete types ### What changes were proposed in this pull request? Now that all the temporary views are wrapped with `TemporaryViewRelation`(#31273, #31652, and #31825), this PR proposes to update `SessionCatalog`'s APIs for temporary views to take or return more concrete types. APIs that will take `TemporaryViewRelation` instead of `LogicalPlan`: ``` createTempView, createGlobalTempView, alterTempViewDefinition ``` APIs that will return `TemporaryViewRelation` instead of `LogicalPlan`: ``` getRawTempView, getRawGlobalTempView ``` APIs that will return `View` instead of `LogicalPlan`: ``` getTempView, getGlobalTempView, lookupTempView ``` ### Why are the changes needed? Internal refactoring to work with more concrete types. ### Does this PR introduce _any_ user-facing change? No, this is internal refactoring. ### How was this patch tested? Updated existing tests affected by the refactoring. Closes #31906 from imback82/use_temporary_view_relation. Authored-by: Terry Kim <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR proposes the following:
AlterViewAs.query
is currently analyzed in the physical operatorAlterViewAsCommand
, but it should be analyzed during the analysis phase.spark.sql.legacy.storeAnalyzedPlanForView
is set to true, storeTermporaryViewRelation
which wraps the analyzed plan, similar to [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase #31273.Why are the changes needed?
Analyzing a plan should be done in the analysis phase if possible.
Not uncaching the view (existing behavior) seems like a bug since the cache may not be used again.
Does this PR introduce any user-facing change?
Yes, now the view can be uncached if it's already cached.
How was this patch tested?
Added new tests around uncaching.
The existing tests such as
SQLViewSuite
should cover the analysis changes.